SIMP59: Data Selection and Visualisation
7.5 credits VT25
This lecture introduces key concepts in data analysis with R Markdown notebooks, focusing on data structures such as tables, networks, and nested data. Participants will learn how to import data frames, filter rows, and select relevant columns to refine their datasets. The session also covers handling missing values and identifying outliers to ensure data quality. We will explore the dplyr package, using the pipe operator to chain data transformations, and discuss the principles of tidy data for efficient analysis and visualisation.
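As a rough illustration of the filtering, column selection, missing-value handling, and outlier checks described above, here is a minimal sketch in R. The `survey` data frame, its column names, and its values are invented for this example, and the 1.5 × IQR rule is only one common convention for flagging outliers, not necessarily the one used in the lecture.

```r
library(dplyr)

# Hypothetical survey data; names and values are invented for illustration.
survey <- tibble(
  id      = 1:6,
  country = c("SE", "SE", "DK", "DK", "NO", "NO"),
  age     = c(34, 29, NA, 41, 38, 120),
  income  = c(310, 280, 295, 330, NA, 305)
)

# Drop rows with a missing age, then keep only the columns we need.
cleaned <- survey %>%
  filter(!is.na(age)) %>%
  select(id, country, age)

# Flag values more than 1.5 * IQR outside the quartiles (an assumed rule).
q <- quantile(cleaned$age, c(0.25, 0.75))
fence <- 1.5 * (q[[2]] - q[[1]])

flagged <- cleaned %>%
  mutate(outlier = age < q[[1]] - fence | age > q[[2]] + fence)

print(flagged)
```

In practice the data frame would usually come from a file, for example via `read.csv()`, rather than being typed in by hand.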
We will also look at how to structure a data analysis around research questions and variables, so that the work stays focused on meaningful insights. We will introduce grouping and aggregation in dplyr to summarise data and compare categories. Participants will learn how to reshape data by lengthening and widening it to match analytical needs. Finally, the session covers methods for exporting cleaned and processed data frames for further use.
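Grouping, aggregation, and export might look like the sketch below. The `scores` data frame is made up for illustration, and writing to a temporary CSV file with base R's `write.csv()` is just one of several possible export routes.

```r
library(dplyr)

# Hypothetical scores for two categories; values are invented for illustration.
scores <- tibble(
  group = c("a", "a", "b", "b", "b"),
  score = c(10, 14, 7, 9, 11)
)

# One row per group, with the group size and the mean score.
by_group <- scores %>%
  group_by(group) %>%
  summarise(n = n(), mean_score = mean(score), .groups = "drop")

# Export the summary; a temporary file is used here so nothing is overwritten.
out_file <- tempfile(fileext = ".csv")
write.csv(by_group, out_file, row.names = FALSE)

print(by_group)
```

Setting `.groups = "drop"` returns an ungrouped result, which avoids surprises if the summary is transformed further.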
Figure 1: In this section of the book, you’ll learn how to import, tidy, transform, and visualize data.
12 Logical vectors, 13 Numbers, 14 Strings, 15 Regular expressions, 16 Factors, 17 Dates and times, 18 Missing values, 19 Joins
20 Spreadsheets, 21 Databases, 22 Arrow, 23 Hierarchical data
Figure 2: The column names of pivoted columns become values in a new column. The values need to be repeated once for each row of the original dataset.
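The pivoting behaviour that the caption describes can be reproduced with a small sketch using tidyr. The data frame `wide` and the column names `measurement` and `value` are assumptions chosen for this example.

```r
library(tidyr)

# Hypothetical wide table: one row per id, one column per measurement.
wide <- data.frame(
  id  = c("A", "B"),
  bp1 = c(100, 140),
  bp2 = c(120, 115)
)

# Lengthen: the names bp1 and bp2 become values in the new `measurement`
# column, repeated once for each row of the original data.
long <- pivot_longer(
  wide,
  cols      = c(bp1, bp2),
  names_to  = "measurement",
  values_to = "value"
)

# Widen again: values of `measurement` become column names once more.
back <- pivot_wider(long, names_from = measurement, values_from = value)

print(long)
print(back)
```

Here `long` has four rows (two ids times two pivoted columns), and widening it again recovers the original layout.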
Data collection (Nov 12)
Exam question 1
Data analysis (Nov 26)
Exam question 2
Workshop 2 (Dec 2)
References